Macro-evolutionary models and coalescent point processes: The shape and probability of reconstructed phylogenies
Forward-time models of diversification (i.e., speciation and extinction)
produce phylogenetic trees that grow "vertically" as time goes by. Pruning the
extinct lineages out of such trees leads to natural models for reconstructed
trees (i.e., phylogenies of extant species). Alternatively, reconstructed trees
can be modelled by coalescent point processes (CPP), where trees grow
"horizontally" by the sequential addition of vertical edges. Each new edge
starts at some random speciation time and ends at the present time; speciation
times are drawn from the same distribution independently. CPP lead to extremely
fast computation of tree likelihoods and simulation of reconstructed trees.
Their topology always follows the uniform distribution on ranked tree shapes
(URT). We characterize which forward-time models lead to URT reconstructed
trees and among these, which lead to CPP reconstructed trees. We show that for
any "asymmetric" diversification model in which speciation rates only depend on
time and extinction rates only depend on time and on a non-heritable trait
(e.g., age), the reconstructed tree is CPP, even if extant species are
incompletely sampled. If rates additionally depend on the number of species,
the reconstructed tree is (only) URT (but not CPP). We characterize the common
distribution of speciation times in the CPP description, and discuss incomplete
species sampling as well as three special model cases in detail: 1) extinction
rate does not depend on a trait; 2) rates do not depend on time; 3) mass
extinctions may additionally happen at certain points in the past.
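The horizontal construction described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The function name `simulate_cpp`, the stem age, and the exponential node-depth distribution are assumptions chosen for the example; the abstract only requires that speciation times be drawn independently from a common distribution. The sketch also assumes the usual convention that the tree is complete once a drawn depth exceeds the stem age.

```python
import random


def simulate_cpp(draw_depth, stem_age, rng):
    """Grow a tree "horizontally": draw i.i.d. node depths until one
    exceeds the stem age, and return the depths that were kept.

    draw_depth is a hypothetical callable taking a random.Random
    instance and returning one speciation time (node depth).
    """
    depths = []
    while True:
        h = draw_depth(rng)
        if h > stem_age:
            # The first depth beyond the stem age closes the tree.
            return depths
        depths.append(h)


# Illustrative run; the exponential distribution is a placeholder.
rng = random.Random(1)
depths = simulate_cpp(lambda r: r.expovariate(1.0), stem_age=2.0, rng=rng)
n_tips = len(depths) + 1  # one initial edge plus one edge per kept depth
```

Each kept depth corresponds to one new vertical edge that starts at that speciation time and ends at the present, so the cost of a simulation is linear in the number of tips.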
Bayesian phylogenetic estimation of fossil ages
Recent advances have allowed for both morphological fossil evidence and
molecular sequences to be integrated into a single combined inference of
divergence dates under the rule of Bayesian probability. In particular the
fossilized birth-death tree prior and the Lewis-Mk model of discrete
morphological evolution allow for the estimation of both divergence times and
phylogenetic relationships between fossil and extant taxa. We exploit this
statistical framework to investigate the internal consistency of these models
by producing phylogenetic estimates of the age of each fossil in turn, within
two rich and well-characterized data sets of fossil and extant species
(penguins and canids). We find that the estimation accuracy of fossil ages is
generally high with credible intervals seldom excluding the true age and median
relative error in the two data sets of 5.7% and 13.2% respectively. The median
relative standard error (RSD) was 9.2% and 7.2% respectively, suggesting good
precision, although with some outliers. In fact, in the two data sets we analyze,
the phylogenetic estimate of fossil age is on average < 2 My from the midpoint
age of the geological strata from which it was excavated. The high level of
internal consistency found in our analyses suggests that the Bayesian
statistical model employed is an adequate fit for both the geological and
morphological data, and provides evidence from real data that the framework
used can accurately model the evolution of discrete morphological traits coded
from fossil and extant taxa. We anticipate that this approach will have diverse
applications beyond divergence time dating, including dating fossils that are
temporally unconstrained, testing of the "morphological clock", and for
uncovering potential model misspecification and/or data errors when
controversial phylogenetic hypotheses are obtained based on combined divergence
dating analyses.
Comment: 28 pages, 8 figures
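The accuracy and precision summaries quoted in this abstract can be illustrated with a short Python sketch. The definitions used here are assumptions, since the abstract does not spell them out: relative error is taken as the absolute difference between the posterior median and the reference age divided by the reference age, and RSD as the posterior standard deviation divided by the posterior mean. The function names and the sample values are hypothetical.

```python
import statistics


def relative_error(posterior_ages, reference_age):
    """Assumed definition: |posterior median - reference| / reference."""
    median_age = statistics.median(posterior_ages)
    return abs(median_age - reference_age) / reference_age


def relative_sd(posterior_ages):
    """Assumed definition: sample standard deviation / mean."""
    return statistics.stdev(posterior_ages) / statistics.mean(posterior_ages)


# Made-up posterior sample of one fossil's age, in My.
samples = [9.0, 10.0, 11.0]
err = relative_error(samples, reference_age=8.0)
rsd = relative_sd(samples)
```

Repeating the calculation for each fossil in turn and taking the median over fossils would give data-set-level summaries of the kind reported above.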
A polynomial time algorithm for calculating the probability of a ranked gene tree given a species tree
In this paper, we provide a polynomial time algorithm to calculate the
probability of a ranked gene tree topology for a given species tree,
where a ranked tree topology is a tree topology with the internal vertices
being ordered. The probability of a gene tree topology can thus be calculated
in polynomial time if the number of orderings of the internal vertices is a
polynomial number. However, the complexity of calculating the probability of a
gene tree topology with an exponential number of rankings for a given species
tree remains unknown.
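How many rankings a topology admits can be checked directly. The following Python sketch counts the orderings of the internal vertices of a rooted binary tree using the standard formula (number of internal vertices)! divided by the product, over internal vertices, of the number of internal vertices in each subtree. It is not the algorithm of the paper. The nested-tuple tree encoding and the function name `count_rankings` are assumptions made for the example.

```python
from math import factorial


def count_rankings(tree):
    """Count orderings of internal vertices consistent with the topology.

    A tree is a leaf label (any non-tuple) or a pair (left, right).
    """
    def walk(node):
        # Returns (internal vertices in subtree, product of those counts
        # over all internal vertices of the subtree).
        if not isinstance(node, tuple):
            return 0, 1
        left_n, left_p = walk(node[0])
        right_n, right_p = walk(node[1])
        n = 1 + left_n + right_n
        return n, n * left_p * right_p

    total, product = walk(tree)
    return factorial(total) // product


caterpillar = ((("A", "B"), "C"), "D")   # only one possible ranking
balanced = (("A", "B"), ("C", "D"))      # either cherry may coalesce first
n_caterpillar = count_rankings(caterpillar)
n_balanced = count_rankings(balanced)
```

A caterpillar topology has a single ranking regardless of size, whereas balanced topologies accumulate rankings quickly, which is the regime where the complexity question above remains open.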
Simulating Trees with a Fixed Number of Extant Species
In this paper, I develop efficient tools to simulate trees with a fixed number of extant species. The tools are provided in my open source R-package TreeSim available on CRAN. The new model presented here is a constant rate birth-death process with mass extinction and/or rate shift events at arbitrarily fixed times 1) before the present or 2) after the origin. The simulation approach for case (2) can also be used to simulate under more general models with fixed events after the origin. I use the developed simulation tools for showing that a mass extinction event cannot be distinguished from a model with constant speciation and extinction rates interrupted by a phase of stasis based on trees consisting of only extant species. However, once we distinguish between mass extinction and period of stasis based on paleontological data, fast simulations of trees with a fixed number of species allow inference of speciation and extinction rates using approximate Bayesian computation and allow for robustness analysis once maximum likelihood parameter estimations are available.
Phylogenetic analysis accounting for age-dependent death and sampling with applications to epidemics
The reconstruction of phylogenetic trees based on viral genetic sequence data
sequentially sampled from an epidemic provides estimates of the past
transmission dynamics, by fitting epidemiological models to these trees. To our
knowledge, none of the epidemiological models currently used in phylogenetics
can account for recovery rates and sampling rates dependent on the time elapsed
since transmission.
Here we introduce an epidemiological model where infectives leave the
epidemic, either by recovery or sampling, after some random time which may
follow an arbitrary distribution.
We derive an expression for the likelihood of the phylogenetic tree of
sampled infectives under our general epidemiological model. The analytic
concept developed in this paper will facilitate inference of past
epidemiological dynamics and provide an analytical framework for performing
very efficient simulations of phylogenetic trees under our model. The main idea
of our analytic study is that the non-Markovian epidemiological model giving
rise to phylogenetic trees growing vertically as time goes by, can be
represented by a Markovian "coalescent point process" growing horizontally by
the sequential addition of pairs of coalescence and sampling times.
As examples, we discuss two special cases of our general model, namely an
application to influenza and an application to HIV. Though phrased in
epidemiological terms, our framework can also be used for instance to fit
macroevolutionary models to phylogenies of extant and extinct species,
accounting for general species lifetime distributions.
Comment: 30 pages, 2 figures
Bayesian inference of sampled ancestor trees for epidemiology and fossil calibration
Phylogenetic analyses which include fossils or molecular sequences that are
sampled through time require models that allow one sample to be a direct
ancestor of another sample. As previously available phylogenetic inference
tools assume that all samples are tips, they do not allow for this possibility.
We have developed and implemented a Bayesian Markov Chain Monte Carlo (MCMC)
algorithm to infer what we call sampled ancestor trees, that is, trees in which
sampled individuals can be direct ancestors of other sampled individuals. We
use a family of birth-death models where individuals may remain in the tree
process after the sampling, in particular we extend the birth-death skyline
model [Stadler et al, 2013] to sampled ancestor trees. This method allows the
detection of sampled ancestors as well as estimation of the probability that an
individual will be removed from the process when it is sampled. We show that
sampled ancestor birth-death models where all samples come from different time
points are non-identifiable and thus require one parameter to be known in order
to infer other parameters. We apply this method to epidemiological data, where
the possibility of sampled ancestors enables us to identify individuals that
infected other individuals after being sampled and to infer fundamental
epidemiological parameters. We also apply the method to infer divergence times
and diversification rates when fossils are included among the species samples,
so that fossilisation events are modelled as a part of the tree branching
process. Such modelling has many advantages, as argued in the literature. The
sampler is available as an open-source BEAST2 package
(https://github.com/gavryushkina/sampled-ancestors).
Comment: 34 pages (including Supporting Information), 8 figures, 1 table. Part
of the work presented at Epidemics 2013 and The 18th Annual New Zealand
Phylogenomics Meeting, 201